A comprehensive analysis intended for Air services Australia of International Air-Traffic within Australia to gain insights into current and future demands in the Australian aviation industry for a data-driven recommendation for implementation of resources and ventures of their corporation
Qantas is the only government-owned dominant competitor in domestic and international aviation. Its large frequency of flights taken for regional carriers and its government-funding provide it stability in market. The number of incoming flights has increased by integer multiple by 1.09. Air Services Australia is recommended to expect increase in incoming flight-traffic and demand for volume management and future international ventures.
Data sourced from The Bureau of Infrastructure, Transport and Regional Economics (BITRE) is preliminarily prepared data collected through various third-party international passenger airlines in operational status at publication period. Flights dataset entails flight characteristics data of scheduled international passenger airlines operating to and from Australia. Data not supplied before publication of dataset is inferred to on basis of pre-existing schedules or estimated on previous reports. Seat allocation is shared between connection flights, where interchanges occur seat count based on individual routes in ‘Stops’ variable in dataset. Data overstates available capacity to/from the diversion ports where purchases are not possible. Ethically collected on open means, where legibility is assessed on source trustability and whether data corresponds with existing schedules and flight specifications (previously considered by BITRE). Ethical purpose being data exclusive of passenger data, with primary intentions solely on business intelligence and analytics.
# Read in your data
## International Airlines operating from Australia
flights = read.csv("C:\\Users\\User\\Desktop\\Data1001\\Project 3\\flights.csv")
library(tidyverse)
library(ggplot2)
library(plotly)
# Glimpse of data and data types
str(flights)
## 'data.frame': 89312 obs. of 15 variables:
## $ Month : int 37865 37865 37865 37865 37865 37865 37865 37865 37865 37865 ...
## $ In_Out : Factor w/ 2 levels "I","O": 1 1 1 1 1 1 1 1 1 1 ...
## $ Australian_City : Factor w/ 14 levels "Adelaide","Brisbane",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ International_City: Factor w/ 99 levels "Abu Dhabi","Apia",..: 17 32 40 83 83 83 3 3 3 3 ...
## $ Airline : Factor w/ 84 levels "Aerolineas Argentinas",..: 37 25 52 60 60 68 9 36 37 52 ...
## $ Route : Factor w/ 1152 levels "ADL-AKL","ADL-CAN",..: 259 335 396 820 821 800 16 16 19 21 ...
## $ Port_Country : Factor w/ 46 levels "Argentina","Austria",..: 16 14 21 32 32 32 26 26 26 26 ...
## $ Port_Region : Factor w/ 10 levels "Africa","Europe",..: 10 6 10 10 10 10 7 7 7 7 ...
## $ Service_Country : Factor w/ 44 levels "Argentina","Austria",..: 16 14 21 30 30 30 25 25 25 25 ...
## $ Service_Region : Factor w/ 10 levels "Africa","Europe",..: 10 6 10 10 10 10 7 7 7 7 ...
## $ Stops : int 0 0 0 1 1 0 0 0 0 0 ...
## $ All_Flights : int 13 8 17 4 9 12 36 18 8 14 ...
## $ Max_Seats : int 3809 2008 4726 908 2038 3876 12624 2556 2296 5404 ...
## $ Year : int 2003 2003 2003 2003 2003 2003 2003 2003 2003 2003 ...
## $ Month_num : int 9 9 9 9 9 9 9 9 9 9 ...
The 10 top and lower brackets of the populous airline organisations
library(ggplot2)
library(plotly)
library(ggpubr)
library(knitr)
library(kableExtra)
x= sort(table(flights$Airline),decreasing = T)
topx = head(x,10) #Data Wrangling/subsetting
lowerx = tail(x,10)
kable(list(topx,lowerx
),
caption = 'Top 10 and Lower 10.',
booktabs = TRUE, valign = 't'
)
|
|
# tabluar presentation
Majority of flights operated through Qantas, JetStar and Virgin Australia followed by international organisations separate from government engagement. These include Emirates and Air New Zealand, contributing to the economy these ventures are not directly under resource responsibilities of Air Services Australia
#flight frequency barplot
p1 = ggplot(flights, aes(x=factor(Airline)) ) +geom_bar(width = 0.8,color='darkblue',fill = "#ADD8E6") + ggtitle("Number of Flights of Airline") + xlab("Airline") + ylab("Frequency") + theme_bw()+ theme(axis.text.x = element_text(angle = 90, hjust = 1))+theme(axis.text.x= element_text(size = 6))
ggplotly(p1)
In and out operating flights concerned with major cities (Sydney, Melbourne and Brisbane) intentions of tourist, education, career and economic engagement.
#demand per major city
p2 = ggplot(flights, aes(x=factor(Australian_City)) ) +geom_bar(width = 0.8,color='green',fill = "#98FB98") + ggtitle("Routes for City") + xlab("City") + ylab("Frequency") + theme_bw()+ theme(axis.text.x = element_text(angle = 60, hjust = 1))+theme(axis.text.x= element_text(size = 9))
ggplotly(p2)
Despite range of frequency of flights of Qantas and other regional airlines. International airlines with significant difference in passenger capacity with accordance to longer travel and expense. Expected on-going maintenance and demand for regular air traffic monitoring for domestic flights by Air Services than of international due to frequency.
#the top 10 airline vs its average seats on offer
quantas_data <- flights[ which(flights$Airline=='Qantas Airways'
& flights$Max_Seats > 0), ]
quantas_avg_seats = sum(quantas_data$Max_Seats)/length(quantas_data$Max_Seats)
jetstar_data <- flights[ which(flights$Airline=='Jetstar'
& flights$Max_Seats > 0), ]
jetstar_avg_seats = sum(jetstar_data$Max_Seats)/length(jetstar_data$Max_Seats)
virgin_data <- flights[ which(flights$Airline=='Virgin Australia'
& flights$Max_Seats > 0), ]
virgin_avg_seats = sum(virgin_data$Max_Seats)/length(virgin_data$Max_Seats)
Air_Newzealand_data <- flights[ which(flights$Airline=='Air New Zealand'
& flights$Max_Seats > 0), ]
Air_Newzealand_avg_seats = sum(Air_Newzealand_data$Max_Seats)/length(Air_Newzealand_data$Max_Seats)
emirates_data <- flights[ which(flights$Airline=='Emirates'
& flights$Max_Seats > 0), ]
emirates_avg_seats = sum(emirates_data$Max_Seats)/length(emirates_data$Max_Seats)
cathay_pacific_airways_data <- flights[ which(flights$Airline=='Cathay Pacific Airways'
& flights$Max_Seats > 0), ]
cathay_pacific_airways_avg_seats = sum(cathay_pacific_airways_data$Max_Seats)/length(cathay_pacific_airways_data$Max_Seats)
garuda_indonesia_data <- flights[ which(flights$Airline=='Garuda Indonesia'
& flights$Max_Seats > 0), ]
garuda_indonesia_avg_seats = sum(garuda_indonesia_data$Max_Seats)/length(garuda_indonesia_data$Max_Seats)
malaysia_airlines_data <- flights[ which(flights$Airline=='Malaysia Airlines'
& flights$Max_Seats > 0), ]
malaysia_airlines_avg_seats = sum(malaysia_airlines_data$Max_Seats)/length(malaysia_airlines_data$Max_Seats)
united_airlines_data <- flights[ which(flights$Airline=='United Airlines'
& flights$Max_Seats > 0), ]
united_airlines_avg_seats = sum(united_airlines_data$Max_Seats)/length(united_airlines_data$Max_Seats)
thai_airways_international_data <- flights[ which(flights$Airline=='Thai Airways International'
& flights$Max_Seats > 0), ]
thai_airways_international_avg_seats = sum(thai_airways_international_data$Max_Seats)/length(thai_airways_international_data$Max_Seats)
#Data frame for average seats fro top 10 airlines
seats = data.frame("Airlines" = c("Quantas","Jetstar","Virgin Australia",'Air New Zealand','Emirates','Cathay Pacific Airways','Garuda Indonesia',"Malaysia Airlines",'United Airlines','Thai Airways International'),"Avg_seats" = c(quantas_avg_seats,jetstar_avg_seats,virgin_avg_seats,Air_Newzealand_avg_seats,emirates_avg_seats,cathay_pacific_airways_avg_seats,garuda_indonesia_avg_seats,malaysia_airlines_avg_seats,united_airlines_avg_seats,thai_airways_international_avg_seats))
#Airline vs Avg.seats barplot
p3 <- ggplot(data=seats, aes(x=Airlines, y=Avg_seats)) +
geom_bar(stat="identity",width = 0.8,colour = "black",fill ="#FFFF99" )+ ggtitle("Avg seats offerred by Airline") + xlab("Airline") + ylab("Seats")+ theme_bw()+ theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplotly(p3)
In and out operating flights within 2013 to 2018 bracket present fluctuance in flight frequency with visible correlation (0.9851574) between incoming and outgoing flights per year.
library(cowplot)
#out per year
numo = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2013])
numo2 = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2014])
numo3 = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2015])
numo4 = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2016])
numo5 = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2017])
numo6 = length(quantas_data$In_Out[quantas_data$In_Out == "O"&quantas_data$Year == 2018])
#in per year
numi = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2013])
numi2 = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2014])
numi3 = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2015])
numi4 = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2016])
numi5 = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2017])
numi6 = length(quantas_data$In_Out[quantas_data$In_Out == "I"&quantas_data$Year == 2018])
quantas_in_out_data = data.frame("Year" = c("2013","2014","2015","2016","2017","2018"), "In" = c(numi,numi2,numi3,numi4,numi5,numi6),"Out" = c(numo,numo2,numo3,numo4,numo5,numo6))
#Airline incoming barplot
p4 <- ggplot(data=quantas_in_out_data, aes(x=Year, y=In)) +
geom_bar(position = 'dodge',stat="identity",width = 0.8,colour = "black",fill ="#00FFFF" )+ ggtitle("In to Australia") + xlab("Year") + ylab("Flights")+ theme_bw()+ theme(axis.text.x = element_text(angle = 60, hjust = 1))+
geom_text(aes(label=In), position=position_dodge(width=0.9), vjust=-0.37)
#Airline out-going barplot
p5 <- ggplot(data=quantas_in_out_data, aes(x=Year, y=Out)) +
geom_bar(position = 'dodge',stat="identity",width = 0.8,colour = "black",fill ="#87ceeb" )+ ggtitle("Out of Australia") + xlab("Year") + ylab("Flights")+ theme_bw()+ theme(axis.text.x = element_text(angle = 60, hjust = 1))+
geom_text(aes(label=Out), position=position_dodge(width=0.9), vjust=-0.37)
plot_grid(p4, p5, labels = "AUTO") #aligning next to each other
With sharp decrease in 2018 likely for the lack of pilots and skilled maintenance engineers. (Australian Aviation, July 27, 2018)
summary(quantas_in_out_data)
## Year In Out
## 2013:1 Min. :139.0 Min. :139.0
## 2014:1 1st Qu.:494.8 1st Qu.:452.5
## 2015:1 Median :524.0 Median :477.5
## 2016:1 Mean :470.3 Mean :429.8
## 2017:1 3rd Qu.:548.8 3rd Qu.:504.0
## 2018:1 Max. :593.0 Max. :528.0
cor(quantas_in_out_data$In,quantas_in_out_data$Out) #correlation between both variables
## [1] 0.9851574
mean_In = mean(quantas_in_out_data$In)
mean_Out = mean(quantas_in_out_data$Out)
sd_In = sd(quantas_in_out_data$In)
sd_Out = sd(quantas_in_out_data$Out)
simulated_In = trunc(rnorm(1000,mean_In,sd_In))
simulated_Out = trunc(rnorm(1000,mean_Out,sd_Out))
cor(simulated_In,simulated_Out) # correlation with simulation
## [1] 0.01485244
simulated_data = data.frame(simulated_In,simulated_Out)
#linear model
p6 = ggplot(simulated_data, aes(simulated_In,simulated_data$simulated_Out)) + geom_point(shape=18,na.rm = TRUE)+ xlim(0,1000)+
geom_smooth(method=lm)+ggtitle("In vs Out")+ xlab("In")+ ylab("Out")
ggplotly(p6)
#residual model
l=lm(simulated_Out~simulated_In)
plot(simulated_In,l$residuals,xlab = "Simulated_In", ylab="Residuals", xlim=c(0,1000),ylim=c(-200,500))
abline(h = 0, col = "blue")
Strong cluster of scattered points presents heteroscadestic relationship as variance about the mean indicates consistent relationship along the linear line at 400±40 flights on average across the sample size.
For a simulated 1000 passengers the resulting sample sum that come In to Australia in 10 years.
library(multicon)
#modelling
# simulate in and out for next 10 yearsev = meanbox
box=c(1,0) # Define box - 1 == "In" & 0 == "Out
n = 10 # 10 years
meanbox=mean(box)
sdbox=popsd(box)
ev = meanbox
se = sdbox/sqrt(n)
#between 2013 to 2018
numof_O = 2549
numof_i = 2822
percentof_o = numof_O/5371
percentof_i = numof_i/5371
# percentof_o
# percentof_i
totals = replicate(1000, sum(sample(box, 10, prob=c(percentof_o,percentof_i), rep = T)))
hist(totals,prob=T)
m=mean(totals)
s=sd(totals)
curve(dnorm(x, mean=m, sd=s),
col="darkblue", lwd=2, add=TRUE)
c(ev - 2*se,ev + 2*se) #95% confidence interval
## [1] 0.1837722 0.8162278
Null hypothesis being mean number of flights In same as Out? Is there variance between In and Out? Assuming the number of flights is random and each is independent, also the population is of normal distribution from previous modelling. Since p-value is > 0.05 and difference in mean being 40.167. The null hypothesis is rejected and the alternative hypothesis is retained that variance in difference of mean exists.
# Paired T- Test (population SD is unknown, small sample size.)
In = quantas_in_out_data$In
Out = quantas_in_out_data$Out
diff = abs(In-Out)
mean(diff)
## [1] 40.5
sd(diff)
## [1] 33.89248
t.test(diff, mu = 0)
##
## One Sample t-test
##
## data: diff
## t = 2.927, df = 5, p-value = 0.03275
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 4.932027 76.067973
## sample estimates:
## mean of x
## 40.5
2*pnorm(2.927, lower.tail=F)
## [1] 0.003422489
The resulting insights are solely based on the legibility of collected dataset and simulated results act as only an estimate of future behaviour. Research paper by John Kain and Richard Webb states as shown previously international airlines “…control airline seat capacity deployed on scheduled services over flight routes. The frequency of flight numbers is supported as, “Qantas Airways Group has extensive commercial ownership with regional carriers alongside Virgin Blue”. Passenger intents solely based on “…business sector and inbound tourism.”. An article by Financial Review, highlights “Virgin Australia and Qantas dominate the Australian domestic market”, while Quantas itself is the only international competitor of Australia while others struggle.
Style: Harvard
In-text(John Kain and Richard Webb ,16 June 2003 )
Reasearch paper - Turbulent Times: Australian Airline Industry Issues 2003
Style: Harvard
In-text(Financial Review, Jun 28, 2017)
Reasearch article - Australian aviation not for the faint-hearted
https://www.afr.com/chanticleer/australian-aviation-not-for-the-fainthearted-20170628-gx0dms
Style: Harvard
In-text (Australian Aviation, July 27, 2018)
Reasearch article - Report Says Australia Suffering Severe Shortage Of Pilots And Engineers
Style: Harvard
Purpose : Dataset Notes (Creative Commons Attribution 3.0 Australia,01/05/2020)
External packages used | ggplot2 | plotly | ggpubr | multicon | cowplot | knitr | kableExtra |